 Stara Zagora Province


ChronoFact: Timeline-based Temporal Fact Verification

Barik, Anab Maulana, Hsu, Wynne, Lee, Mong Li

arXiv.org Artificial Intelligence

Automated fact verification plays an essential role in fostering trust in the digital space. Despite growing interest, the verification of temporal facts has received little attention in the community. Temporal fact verification brings new challenges: temporal cues must be extracted from the text, and temporal reasoning over various temporal aspects of the text must be applied. In this work, we propose an end-to-end solution for temporal fact verification that uses the temporal information in claims to obtain relevant evidence sentences and harnesses the power of a large language model for temporal reasoning. Recognizing that temporal facts often involve events, we model these events in the claim and evidence sentences. We curate two temporal fact datasets to learn time-sensitive representations that encapsulate not only the semantic relationships among the events, but also their chronological proximity. This allows us to retrieve the top-k relevant evidence sentences and provide the context for a large language model to perform temporal reasoning and output whether a claim is supported or refuted by the retrieved evidence sentences. Experimental results demonstrate that the proposed approach significantly enhances the accuracy of temporal claim verification, thereby advancing the current state of the art in automated fact verification.
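To make the retrieval step concrete, here is a minimal sketch of top-k evidence ranking that mixes semantic similarity with chronological proximity. It is not the ChronoFact model: the paper learns time-sensitive representations, whereas this sketch assumes precomputed sentence vectors, a single year per sentence, and a hand-picked scoring rule. The names `top_k_evidence`, `alpha`, and `tau` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_proximity(claim_year, evidence_year, tau=5.0):
    """Assumed decay: 1.0 for the same year, falling off exponentially with the gap."""
    return math.exp(-abs(claim_year - evidence_year) / tau)


def top_k_evidence(claim_vec, claim_year, evidence, k=2, alpha=0.5, tau=5.0):
    """Return the k sentences with the highest combined score.

    evidence is a list of (sentence, vector, year) tuples.
    alpha weights semantic similarity against temporal proximity (an assumption).
    """
    scored = [
        (alpha * cosine(claim_vec, vec)
         + (1 - alpha) * temporal_proximity(claim_year, year, tau), sentence)
        for sentence, vec, year in evidence
    ]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [sentence for _, sentence in best]


evidence = [
    ("A", [1.0, 0.0], 2000),  # same meaning, same year
    ("B", [1.0, 0.0], 1950),  # same meaning, far away in time
    ("C", [0.8, 0.6], 2001),  # related meaning, close in time
]
selected = top_k_evidence([1.0, 0.0], 2000, evidence, k=2)
```

In this toy example the sentence that is close in time outranks the one that matches semantically but lies fifty years away. The selected sentences would then be passed as context to a language model for the final supported/refuted decision.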


Knowledge Generation -- Variational Bayes on Knowledge Graphs

Wolf, Florian

arXiv.org Artificial Intelligence

This thesis is a proof of concept for the potential of Variational Auto-Encoders (VAEs) for representation learning on real-world Knowledge Graphs (KGs). Inspired by successful approaches to the generation of molecular graphs, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE). We compare the impact of the modular hyperparameter choices: encoding through graph convolutions, graph matching, and the latent space prior. The RGVAE is first evaluated on link prediction. The mean reciprocal rank (MRR) scores on the two datasets FB15K-237 and WN18RR are compared to those of the embedding-based model DistMult. A variational DistMult and an RGVAE without latent space prior constraint are implemented as control models. The results show that, among the different settings, the RGVAE with relaxed latent space scores highest on both datasets, yet does not outperform DistMult. Further, we investigate the latent space in a twofold experiment: first, linear interpolation between the latent representations of two triples, then the exploration of each latent dimension in a $95\%$ confidence interval. Both interpolations show that the RGVAE learns to reconstruct the adjacency matrix but fails to disentangle. For the last experiment, we introduce a new validation method for the FB15K-237 dataset. The relation type constraints of generated triples are filtered and matched with entity types. The observed rate of valid generated triples is not significantly higher than the random threshold. All generated and valid triples are unseen. A comparison between different latent space priors, using the $\delta$-VAE method, reveals a decoder collapse. Finally, we analyze the limiting factors of our approach compared to molecule generation and propose solutions for the decoder collapse and successful representation learning of multi-relational KGs.
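For readers unfamiliar with the link prediction metric mentioned above, the following is a minimal sketch of how MRR is typically computed, together with the standard DistMult trilinear score. It is not the thesis code: the function names and toy embeddings are hypothetical, and ties are resolved optimistically, which is an assumption.

```python
def distmult_score(subject, relation, obj):
    """DistMult score: sum over dimensions of s_i * r_i * o_i."""
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


def rank_of_true(scores, true_index):
    """Rank of the true candidate: 1 + number of candidates scoring strictly higher."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries; ranks start at 1."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy query (s, r, ?): score every candidate object, then rank the true one.
subject = [1.0, 2.0]
relation = [0.5, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical object embeddings
scores = [distmult_score(subject, relation, o) for o in candidates]
rank = rank_of_true(scores, true_index=1)
mrr = mean_reciprocal_rank([1, 2, 4])
```

An MRR of 1.0 means the true entity is always ranked first; the score shrinks quickly as true entities fall down the ranking.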


Table-to-Text: Describing Table Region with Natural Language

Bao, Junwei, Tang, Duyu, Duan, Nan, Yan, Zhao, Lv, Yuanhua, Zhou, Ming, Zhao, Tiejun

arXiv.org Artificial Intelligence

In this paper, we present a generative model that produces a natural language sentence describing a table region, e.g., a row. The model maps a row from a table to a continuous vector and then generates a natural language sentence by leveraging the semantics of the table. To deal with rare words appearing in a table, we develop a flexible copying mechanism that selectively replicates contents from the table in the output sequence. Extensive experiments demonstrate the accuracy of the model and the power of the copying mechanism. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively. Furthermore, we introduce an open-domain dataset, WIKITABLETEXT, which includes 13,318 explanatory sentences for 4,962 tables. Our model achieves a BLEU-4 score of 38.23, which outperforms template-based and language-model-based approaches.


Table-to-Text: Describing Table Region With Natural Language

Bao, Junwei (Harbin Institute of Technology) | Tang, Duyu (Microsoft Research) | Duan, Nan (Microsoft Research) | Yan, Zhao (Beihang University) | Lv, Yuanhua (Microsoft AI and Research) | Zhou, Ming (Microsoft Research) | Zhao, Tiejun (Harbin Institute of Technology)

AAAI Conferences


Extending an Information Extraction tool set to Central and Eastern European languages

Ignat, Camelia, Pouliquen, Bruno, Ribeiro, Antonio, Steinberger, Ralf

arXiv.org Artificial Intelligence

In a highly multilingual and multicultural environment such as the European Commission, which will soon have over twenty official languages, there is an urgent need for text analysis tools that use minimal linguistic knowledge so that they can be adapted to many languages without much human effort. We present two such Information Extraction tools that have already been adapted to various Western and Eastern European languages: one for the recognition of date expressions in text, and one for the detection of geographical place names and the visualisation of the results on geographical maps. An evaluation of the performance has produced very satisfying results.
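As an illustration of what "minimal linguistic knowledge" could look like for date recognition, the sketch below keeps one small month-name list per language and shares everything else. It is an assumption-laden toy, not the tool described in the paper: the `MONTHS` table, the `find_dates` function, and the restriction to "day month-name year" patterns are all hypothetical.

```python
import re

# Per-language month names: the only language-specific resource in this sketch.
# Polish names are given in the genitive, the form used after a day number.
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "pl": ["stycznia", "lutego", "marca", "kwietnia", "maja", "czerwca", "lipca",
           "sierpnia", "września", "października", "listopada", "grudnia"],
}


def find_dates(text, lang):
    """Return (day, month, year) tuples for 'day month-name year' expressions."""
    names = MONTHS[lang]
    month_number = {name.lower(): i + 1 for i, name in enumerate(names)}
    # Longest names first so that no name is shadowed by a shorter alternative.
    alternatives = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    pattern = re.compile(
        r"\b(\d{1,2})\s+(" + alternatives + r")\s+(\d{4})\b", re.IGNORECASE
    )
    return [
        (int(m.group(1)), month_number[m.group(2).lower()], int(m.group(3)))
        for m in pattern.finditer(text)
    ]


polish = find_dates("Spotkanie odbyło się 5 maja 2004 w Warszawie.", "pl")
english = find_dates("The treaty was signed on 1 May 2004.", "en")
```

Adapting this sketch to another language only requires adding one more entry to `MONTHS`, which is the kind of low-effort extension the abstract argues for.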